-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-44615: [C++][Compute] Add extract_regex_span function #45577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
228c2c5 to
88da027
Compare
|
@pitrou could you review this issue? |
mapleFU
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the compute document also need to add this?
Yes, this is a new compute function called extract_regex_span. Should I add changes somewhere? |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @arashandishgar . There are a number of things to improve in this PR even though the overall implementation is sound, see comments below.
|
Appreciate your feedback! I’ll look into it and follow up soon. |
29ff6ea to
9442b40
Compare
9442b40 to
2b82534
Compare
|
@pitrou I've applied all changes that you suggest. Could You review again? |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. Here are some more comments.
Also, can you add some documentation to https://github.com/apache/arrow/blob/main/docs/source/cpp/compute.rst#string-component-extraction?
|
Thanks for your review, I'll look into it |
Thanks for mentioning it; I wrote the relevant documentation. |
2b82534 to
229ed3b
Compare
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @arashandishgar . Here are a couple more comments, can you take a look and address them?
|
Thanks for your comment. I will look into it |
d6c52ea to
2764dc9
Compare
2764dc9 to
703a5ac
Compare
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made some minor changes and added Python bindings. I'll merge this PR if CI is green, thank you @arashandishgar !
|
@github-actions crossbow submit -g cpp |
|
Revision: 703a5ac Submitted crossbow builds: ursacomputing/crossbow @ actions-4358c6f365 |
I wanted to take a moment to sincerely thank you for your patience and constructive feedback during the code review process. As this was my first experience contributing to Apache Arrow, I learned a lot throughout the review, and your thoughtful suggestions were extremely valuable. I realized that I made a mistake by not properly generalizing your suggestion across the entire codebase, especially in my second commit. In my rush to submit the pull request, I overlooked that point. I truly appreciate your understanding, and I hope to be more diligent in applying feedback consistently in future contributions. Thanks again for your time and support. |
|
You're welcome, @arashandishgar . Feel free to contribute again! |
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 0494115. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
While the
extract_regexfunction returns substrings of the matching regex captures,extract_regex_spanreturns (index, length) pairs of these substrings relative to the original string values.Are these changes tested?
Yes, by dedicated unit tests.
Are there any user-facing changes?
No, except a new compute function.
compute.extract_regex#44615